What is Ward's Method?

Ward’s method (also known as the minimum variance method or Ward’s minimum variance clustering method) is an agglomerative hierarchical clustering method and an alternative to single-link clustering. It is popular in fields such as linguistics because it tends to produce compact, evenly sized clusters. Like most clustering methods, it is computationally intensive, but because it merges clusters greedily, one pair at a time, it needs far fewer computations than an exhaustive search over all possible groupings. The drawback is that the result is not guaranteed to be the optimal partition. That said, the resulting clusters are usually good enough for most purposes.

Like other agglomerative methods, Ward’s method starts with n clusters, each containing a single object. These clusters are merged step by step until one cluster contains all objects. At each step, the method merges the two clusters whose union gives the smallest increase in within-cluster variance, measured by an index called E (also called the sum of squares index).
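The merging rule described above can be tried on a small example. The following is a minimal sketch on made-up two-dimensional points, not the sequence data analysed later; the helper `ess()` and the toy matrix `x` are hypothetical names introduced only for illustration, and base R's `hclust()` with `method = "ward.D2"` is assumed as the Ward implementation.

```r
# Toy data (hypothetical): two well-separated groups of ten 2-D points
set.seed(1)
x <- rbind(matrix(rnorm(20, mean = 0, sd = 0.3), ncol = 2),
           matrix(rnorm(20, mean = 5, sd = 0.3), ncol = 2))

# Within-cluster sum of squares (the index E) for a given grouping
ess <- function(x, groups) {
  sum(sapply(unique(groups), function(g) {
    xg <- x[groups == g, , drop = FALSE]
    sum(sweep(xg, 2, colMeans(xg))^2)   # squared deviations from the centroid
  }))
}

# Agglomerative clustering with Ward's criterion on Euclidean distances
hc <- hclust(dist(x), method = "ward.D2")

groups2 <- cutree(hc, k = 2)   # two-cluster solution
groups1 <- cutree(hc, k = 1)   # everything merged into one cluster

table(groups2)                 # cluster sizes
E2 <- ess(x, groups2)          # E for the two-cluster solution
E1 <- ess(x, groups1)          # E after the final merge (never smaller than E2)
c(E2 = E2, E1 = E1)
```

Each merge can only increase E, so comparing E across successive cuts of the tree gives a rough sense of how much compactness is lost by using fewer clusters.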

What is Partitioning Around Medoids (PAM)?

The k-medoids, or partitioning around medoids (PAM), algorithm is a clustering algorithm reminiscent of k-means. Both are partitional (they break the dataset up into groups), and both attempt to minimize the distance between the points assigned to a cluster and a point designated as the center of that cluster. Unlike k-means, which uses the cluster mean as the center, k-medoids chooses an actual data point (the medoid), so it only needs pairwise dissimilarities between objects.
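As a rough illustration of PAM, the sketch below runs `pam()` from the cluster package on a dissimilarity object built from made-up points. The toy matrix `x` is hypothetical and stands in for real data; with sequence data, a dissimilarity matrix would be passed instead.

```r
library(cluster)  # provides pam()

# Toy data (hypothetical): two well-separated groups of ten 2-D points
set.seed(1)
x <- rbind(matrix(rnorm(20, mean = 0, sd = 0.3), ncol = 2),
           matrix(rnorm(20, mean = 5, sd = 0.3), ncol = 2))

# PAM only needs pairwise dissimilarities, so a dist object is enough
d <- dist(x)
fit <- pam(d, k = 2)

fit$id.med               # row indices of the observations chosen as medoids
table(fit$clustering)    # number of observations assigned to each medoid
```

Because the medoids are observed cases, each cluster can be summarised by a real, interpretable example rather than by an averaged point.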

Sequence descriptives

##        [-> 1] [-> 2] [-> 3] [-> 4] [-> 5]
## [1 ->]   0.81   0.09   0.04   0.05   0.02
## [2 ->]   0.00   0.89   0.04   0.05   0.01
## [3 ->]   0.00   0.02   0.90   0.06   0.02
## [4 ->]   0.00   0.01   0.02   0.94   0.02
## [5 ->]   0.00   0.01   0.02   0.07   0.90
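The row and column labels above (`[1 ->]`, `[-> 1]`) match the format printed by TraMineR's `seqtrate()`, so the matrix is presumably a transition-rate matrix: entry (i, j) estimates the probability of moving from state i to state j between two consecutive positions, and each row sums to about 1. The call that produced it is not shown, so the following is only a sketch of how such a matrix can be obtained, using the `actcal` example data bundled with TraMineR rather than the data analysed here.

```r
library(TraMineR)

# Example data shipped with TraMineR (not the data used in this report)
data(actcal)
actcal.seq <- seqdef(actcal, 13:24)   # columns 13 to 24 hold the monthly states

# Transition rates between states, rounded as in the output above
tr <- seqtrate(actcal.seq)
round(tr, 2)
```

Large values on the diagonal, as in the matrix above, indicate that cases tend to stay in the same state from one position to the next.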

Clustering and dendrogram
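The objects used in the following sections (`sample`, `dist.om1`, `clusterward1`) are not defined in this part of the report. One common way to build objects of this kind, shown here as a sketch under stated assumptions, is to compute optimal matching distances with `seqdist()` and pass them to `agnes()` with Ward's criterion. The example uses the `mvad` data bundled with TraMineR and hypothetical object names (`mvad.seq`, `mvad.om`, `mvad.ward`); the cost settings are illustrative and may differ from those actually used.

```r
library(TraMineR)
library(cluster)

# Example data shipped with TraMineR (not the data used in this report)
data(mvad)
mvad.seq <- seqdef(mvad, 17:86)   # columns 17 to 86 hold the monthly states

# Optimal matching distances; indel = 1 and transition-rate-based
# substitution costs are assumptions made for this sketch
mvad.om <- seqdist(mvad.seq, method = "OM", indel = 1, sm = "TRATE")

# Agglomerative clustering of the dissimilarity matrix with Ward's criterion
mvad.ward <- agnes(mvad.om, diss = TRUE, method = "ward")

# Dendrogram (the second of the two plots available for agnes objects)
plot(mvad.ward, which.plots = 2, main = "Ward dendrogram (mvad example)")

# Cut the tree into a chosen number of clusters, here four
mvad.cl4 <- cutree(mvad.ward, k = 4)
table(mvad.cl4)
```

The dendrogram is typically inspected for large jumps in merge height before deciding how many clusters to keep, which is why several cuts are compared below.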

Three Clusters

cl1.3 <- cutree(clusterward1, k = 3)
cl1.3fac <- factor(cl1.3, labels = paste("Type", 1:3))

# Number of sequences in each cluster
table(cl1.3)
## cl1.3
##    1    2    3 
## 2244  950  306
# seqrplot displays a reduced, non-redundant set of representative sequences extracted from the state sequence object and sorted by a representativeness criterion
seqrplot(sample, diss = dist.om1, group = cl1.3fac, border = NA)

# seqdplot plots the cross-sectional state frequencies at each position (time point)
seqdplot(sample, group = cl1.3fac, border = NA)

# seqfplot displays the most frequent sequences, each as a horizontal stacked bar of its successive states
seqfplot(sample, group = cl1.3fac, border = NA)

# seqmtplot displays the mean time spent in each state
seqmtplot(sample, group = cl1.3fac, border = NA)

# seqHtplot displays how the cross-sectional entropy evolves over positions (entropy is 0 when all cases are in the same state and maximal when every state holds the same proportion of cases, so it measures the diversity of states observed at a given time point)
seqHtplot(sample, group = cl1.3fac, border = NA)
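The entropy behaviour described in the last comment can be checked by hand. The function below is a minimal sketch of Shannon entropy for a vector of state proportions; `state_entropy()` is a hypothetical helper, and dividing by the log of the number of states (so the maximum is 1) is an assumption of this sketch rather than a statement about how the plot is computed.

```r
# Shannon entropy of a state distribution, optionally scaled to [0, 1]
state_entropy <- function(p, normalize = TRUE) {
  p <- p / sum(p)                 # make sure the proportions sum to 1
  nz <- p[p > 0]                  # 0 * log(0) is treated as 0
  h <- -sum(nz * log(nz))
  if (normalize) h / log(length(p)) else h
}

state_entropy(c(1, 0, 0, 0, 0))               # all cases in one state: 0
state_entropy(rep(0.2, 5))                    # equal shares: the maximum, 1
state_entropy(c(0.7, 0.1, 0.1, 0.05, 0.05))   # somewhere in between
```

A cluster whose entropy curve stays low is dominated by one state at each time point, while a high curve signals a mix of states.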

Four Clusters

cl1.4 <- cutree(clusterward1, k = 4)
cl1.4fac <- factor(cl1.4, labels = paste("Type", 1:4))

table(cl1.4)
## cl1.4
##    1    2    3    4 
## 1936  950  308  306
seqrplot(sample, diss = dist.om1, group = cl1.4fac, border = NA)

seqdplot(sample, group = cl1.4fac, border = NA)

seqfplot(sample, group = cl1.4fac, border = NA)

seqmtplot(sample, group = cl1.4fac, border = NA)

seqHtplot(sample, group = cl1.4fac, border = NA)
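Comparing the counts above with the three-cluster table shows that the solutions are nested: the first cluster of 2244 sequences has split into clusters of 1936 and 308, while the other two clusters are unchanged. Cutting a hierarchical tree one level deeper always splits exactly one cluster. A cross-tabulation makes this visible; the sketch below uses made-up points and hypothetical names (`x`, `hc`, `cl3`, `cl4`) rather than the objects from this report.

```r
# Toy data (hypothetical): three groups of twenty 2-D points
set.seed(42)
x <- rbind(matrix(rnorm(40, mean = 0, sd = 0.5), ncol = 2),
           matrix(rnorm(40, mean = 4, sd = 0.5), ncol = 2),
           matrix(rnorm(40, mean = 8, sd = 0.5), ncol = 2))

hc <- hclust(dist(x), method = "ward.D2")

cl3 <- cutree(hc, k = 3)
cl4 <- cutree(hc, k = 4)

# Rows: three-cluster solution; columns: four-cluster solution.
# Exactly one row has two non-zero cells: the cluster that was split.
xt <- table(cl3, cl4)
xt
```

The same cross-tabulation applied to two successive cuts of the real tree would show which type is being subdivided at each step.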

Five Clusters

cl1.5 <- cutree(clusterward1, k = 5)
cl1.5fac <- factor(cl1.5, labels = paste("Type", 1:5))

table(cl1.5)
## cl1.5
##    1    2    3    4    5 
##  540 1396  950  308  306
seqrplot(sample, diss = dist.om1, group = cl1.5fac, border = NA)

seqdplot(sample, group = cl1.5fac, border = NA)

seqfplot(sample, group = cl1.5fac, border = NA)

seqmtplot(sample, group = cl1.5fac, border = NA)

seqHtplot(sample, group = cl1.5fac, border = NA)

Six Clusters

cl1.6 <- cutree(clusterward1, k = 6)
cl1.6fac <- factor(cl1.6, labels = paste("Type", 1:6))

table(cl1.6)
## cl1.6
##    1    2    3    4    5    6 
##  540 1396  488  462  308  306
seqrplot(sample, diss = dist.om1, group = cl1.6fac, border = NA)

seqdplot(sample, group = cl1.6fac, border = NA)

seqfplot(sample, group = cl1.6fac, border = NA)

seqmtplot(sample, group = cl1.6fac, border = NA)

seqHtplot(sample, group = cl1.6fac, border = NA)

Seven Clusters

cl1.7 <- cutree(clusterward1, k = 7)
cl1.7fac <- factor(cl1.7, labels = paste("Type", 1:7))

table(cl1.7)
## cl1.7
##    1    2    3    4    5    6    7 
##  540 1226  488  462  308  306  170
seqrplot(sample, diss = dist.om1, group = cl1.7fac, border = NA)

seqdplot(sample, group = cl1.7fac, border = NA)

seqfplot(sample, group = cl1.7fac, border = NA)

seqmtplot(sample, group = cl1.7fac, border = NA)

seqHtplot(sample, group = cl1.7fac, border = NA)

Eight Clusters

par(mar = c(1, 1, 1, 1))
cl1.8 <- cutree(clusterward1, k = 8)
cl1.8fac <- factor(cl1.8, labels = paste("Type", 1:8))

table(cl1.8)
## cl1.8
##    1    2    3    4    5    6    7    8 
##  306 1226  488  462  234  308  306  170
seqrplot(sample, diss = dist.om1, group = cl1.8fac, border = NA)

seqdplot(sample, group = cl1.8fac, border = NA)

seqfplot(sample, group = cl1.8fac, border = NA)

seqmtplot(sample, group = cl1.8fac, border = NA)

seqHtplot(sample, group = cl1.8fac, border = NA)
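The six blocks above repeat the same steps for k = 3 to 8. As a possible simplification, the pattern can be written once as a loop. The sketch below does this on made-up points with hypothetical names (`x`, `hc`, `solutions`); in the setting of this report, the tree would be `clusterward1` and the plotting calls would go where the comment indicates.

```r
# Toy data (hypothetical): 60 random 2-D points
set.seed(7)
x <- matrix(rnorm(120), ncol = 2)
hc <- hclust(dist(x), method = "ward.D2")

ks <- 3:8
solutions <- lapply(ks, function(k) {
  cl <- cutree(hc, k = k)
  fac <- factor(cl, labels = paste("Type", 1:k))
  # The per-cluster plots (seqrplot, seqdplot, ...) would be called here,
  # with group = fac
  list(k = k, groups = fac, sizes = table(cl))
})
names(solutions) <- paste0("k", ks)

# Cluster sizes for each number of clusters
lapply(solutions, function(s) s$sizes)
```

Keeping every solution in one list also makes it easier to compare cluster sizes side by side when choosing the number of types.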